DETAILED ACTION

1. This Office Action is in response to the Arguments filed on 12/21/2007. Claims
1-15 and 20 remain pending and have been examined. The Applicants' remarks have
been carefully considered, but they are not persuasive and do not place the claims in
condition for allowance. Accordingly, this action has been made FINAL.

2. All previous objections and rejections directed to the Applicant's disclosure and 
claims not discussed in this Office Action have been withdrawn by the Examiner. 

Response to Arguments 

1. Applicant's arguments, see page 6, filed on 12/21/2007 with respect to the
rejection(s) of claim(s) 1, 9, and 14 under Tackin (US 7,180,892) in view of Smith et al.
(US 6,862,298) have been fully considered and are not persuasive. The Applicant
argues that the limitation of "determining an end to said voice information based on
said measurement and a delay interval; and adjusting said delay interval to correspond
to an average packet delay time" is not taught in the latter cited reference. The
Examiner traverses this argument. As to the former part of the limitation, the
determination of an end to voice information based on said measurements and a delay
interval is taught by the primary reference, Tackin, specifically in col. 25, lines 39-44.
It is obvious that a voice activity detector will detect the start and end of speech as it
detects periods of speech and non-speech. The use of a delay interval is taught in
col. 13, lines 19-21 and col. 14, lines 1-6. The jitter buffer adds a delay for packets that
are not arriving on time and adjusts the time (see col. 35, lines 22-25, lines 26-44, and
lines 23-31). Because the jitter buffer compensates for delayed packets, the end of
voice is affected by this delay, either as additional packets are repeated or because a
silence period is misidentified. Hence, the compensation by the jitter buffer affects the
actual determination of the ending of voice so as to prevent delay at the onset of the
next packet. The latter limitation, adjusting said delay interval to correspond to an
average packet delay time, is taught by Smith et al. In the Abstract and col. 2, lines
44-50, an average packet delay time is used to adjust the delay interval. It is noted by
the Applicant that the Jitter Manager uses the packet delay to manage the size of the
jitter buffer. Although this is true, the jitter buffer manager also controls the speed at
which each packet arrives (see Smith et al., col. 5, lines 25-27). The utilization or
altering of the speed as depicted depends on the buffer size (see Smith, col. 6, lines
32-47). The average packet delay is used to measure the variation in packet arrival.
Hence, the arrival of the next packet depends on the jitter size as well as the speed.
Thus, the end of voice as claimed in the limitation would be obvious, since each packet
arrives at a specific speed, which influences the ending time of speech by filling in or
augmenting the speech information (see Smith, col. 4, lines 9-13). The Smith et al.
reference presents a method for estimating the average packet delay. Therefore, as
denoted above, all of the limitations have been taught by the combination of Tackin in
view of Smith et al.

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 1, 5, 8, 9, 13, 14, 19, and 20 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Tackin (US 7,180,892) in view of Smith et al. (US 6,862,298).

As to claims 1 and 14, Tackin teaches a method, comprising: 

receiving a plurality of packets (see col. 13, lines 1-3) with audio 
information (see Abstract) (e.g. Applicant defines audio information to include 
voice and silence (see page 4, [0006], lines 3-5). Audio packets are retrieved.); 

determining whether said audio information represents voice information 
(see col. 12 lines 4-9) (e.g. The determination of the audio information is found 
by the voice activity detector); and 

buffering said audio information in a jitter buffer (see Figure 6, elements
86 and 90 and col. 13, lines 66-67 to col. 14, lines 1-3 and Figure 25, element 510)
after said determination (see col. 13, lines 18-27). The reference also teaches a
computer readable storage medium for the above limitations (see col. 2, lines
45-51) (e.g. Audio information is buffered.).

wherein said determining comprises: 

receiving frames of audio information at a voice activity detector (see col. 
12, lines 4-5) (e.g. It is shown by the reference that audio information (voice) is 
received by the voice activity detector); 
measuring at least one characteristic (see col. 25, lines 39-44) of said
frames (see col. 2, line 10) (e.g. It is shown that frames are used as the timing
input of the signal containing information);

determining a start of voice information based on said measurements (see
col. 25, lines 39-44) (e.g. A VAD is used to detect speech. It is obvious that the start
and end of speech are found through a VAD, as is known in the art); and

determining an end to said voice information based on said
measurements (see col. 25, lines 39-44) (e.g. It is obvious that a voice activity
detector will detect the start and end of speech as it detects periods of speech
and non-speech) and a delay interval (see col. 13, lines 19-21 and col. 14, lines
1-6) (e.g. The jitter buffer adds a delay for packets that are not arriving on time
and adjusts the time (see col. 35, lines 22-25, lines 26-44, and lines 23-31). The
applicant regards the delay time as being calculated from the jitter buffer (see
Applicant's Specification, page 17, [0038], lines 1-6).).

However, Tackin does not specifically teach the adjusting of the delay 
interval based on an average packet delay time. 

Smith et al. teaches adjusting said delay interval to correspond to an
average packet delay time (see Abstract and col. 2, lines 44-50) (e.g. An average
packet delay time is used to adjust the delay interval).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice based packet network as
taught by Tackin with the use of a delay based on the average packet delay time
as taught by Smith et al. The motivation to have combined the two references
involves the improvement in audio quality by having smaller delays during high
network quality and increasing delay during poor network conditions (see Smith
et al., col. 2, lines 17-23).

As to claim 5, Tackin in view of Smith et al. teaches all of the limitations as in 
claim 1, above. 

Furthermore, Tackin teaches said characteristic comprises an estimate of 
an energy level for said frame (see col. 25, lines 29-38) (e.g. An energy level is 
used to determine if speech is present.). 

As to claims 7 and 19, Tackin in view of Smith et al. teaches all of the 
limitations as in claim 1, above. 

measuring an average packet delay time by said jitter buffer (see Smith et
al., Abstract and col. 2, lines 44-50 and lines 30-33) (An average packet delay
time is calculated and compared to a reference delay. A variation parameter is
measured and then the delay is adjusted.)

sending said average packet delay time (see Smith et al., Abstract and
col. 2, lines 44-50) to said voice activity detector (see Tackin, Figure 6,
elements 90 and 98) (e.g. It is evident from the diagram that the signal proceeds
from the voice synchronizer down to the voice decoder and lost frame recovery
engine, and then to the voice activity detector (see col. 13, line 66 to
col. 14, line 4). The voice synchronizer is used to adjust based on the delay.)

As to claims 8 and 20, Tackin in view of Smith et al. teaches all of the limitations
as in claim 1, above.

Furthermore, Tackin teaches retrieving a frame (see col. 12, lines 60-63)
(e.g. It is implied by the reference that frames of audio are used.) of audio
information from said packets (see Figure 6, element 60a) (e.g. Audio
information in the form of voice is received, which has undergone pulse code
modulation);

receiving an echo cancellation reference signal (see Figure 6, output of
element 108 to input of element 70 and col. 10, lines 66-67 to col. 11, lines 2-4)
(e.g. It is evident that the echo canceller needs a reference signal that is free from
echo to compare with the incoming signal);

canceling echo from said frame of audio information (see col. 10, lines
66-67, col. 11, lines 1-5) (e.g. The input signal contains voice and noise information);
and

sending said frame of audio information to a voice activity detector (see 
Figure 6, output of element 70 to input to element 72 to input of element 80) (e.g. 
A VAD is used and the audio information is sent to determine speech.). 



As to claim 9, Tackin teaches a system comprising: 
an antenna (e.g. It is inherent that digital phones include a built-in
antenna as well as a receiver for hearing audio information and a transmitter for
transmitting information.);

a receiver connected to said antenna to receive a frame of information
(e.g. It is inherent that digital phones include a built-in antenna as well as a
receiver for hearing audio information and a transmitter for transmitting
information.);

a voice activity detector to detect voice information in said frame (see col. 
12, lines 4-5); and 

a jitter buffer to buffer said information after said detection by said voice
activity detector buffer (see Figure 6, elements 80, 86 and 90 and col. 13, lines
66-67 to col. 14, lines 1-3 and Figure 25, element 510). Further, Tackin teaches the
use of digital phones (see col. 6, lines 13-14). It is seen that once the data has
been encoded and VAD has been performed (see col. 11, lines 40-49), the
decoding process utilizes the jitter buffer.

However, Tackin does not specifically teach the adjusting of the delay 
interval based on an average packet delay time. 

Smith et al. teaches adjusting said delay interval to correspond to an
average packet delay time (see Abstract and col. 2, lines 44-50) (e.g. The delay
is adjusted based on the average packet delay time).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice based packet network as
taught by Tackin with the use of a delay based on the average packet delay time
as taught by Smith et al. The motivation to have combined the two references
involves the improvement in audio quality by having smaller delays during high
network quality and increasing delay during poor network conditions (see Smith
et al., col. 2, lines 17-23).

As to claim 13, Tackin in view of Smith teaches all of the limitations as in claim 9,
above.

Furthermore, Tackin teaches said voice activity detector further comprises:

an estimator to estimate energy level values (see col. 25, lines 29-36)
(e.g. Energy levels are estimated.); and

a voice classification module connected to said estimator to classify
information for said frame (see col. 25, lines 29-36) (e.g. It is evident from the
reference that once the energy level is found, classification occurs.).

6. Claims 2, 3, 12, 15, and 16 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Tackin in view of Smith et al. as applied to claims 1, 9, and 14
above, in view of Clemm (US 6,865,162).

As to claims 2 and 15, Tackin in view of Smith et al. teach a voice based packet 
network. 
However, Tackin in view of Smith et al. does not specifically teach
the buffering of a portion of said audio information in a pre-buffer for a
predetermined time interval.

Clemm does teach the use of a buffer (see col. 2, line 31) for a 
predetermined time (see col. 2, lines 31-33) prior to said determining (see Figure 
1, elements 110 and 120 and col. 2, lines 30-37) (e.g. A pre-buffer is used.). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice based packet network as
taught by Tackin and Smith et al. with the buffer before the voice activity detector
as taught by Clemm. The motivation to have combined the references
involves the elimination of clipping associated with voice activity detection
during silence suppression (see Clemm, col. 2, lines 47-48) as would have been
seen in the teachings of Tackin.

As to claims 3 and 16, Tackin in view of Smith teaches all of the limitations as in 
claims 1 and 13, above. 

Furthermore, Tackin teaches sending said information from the jitter buffer
to an end user (see Figure 6, output of element 60b) (e.g. The applicant denotes
the endpoint to be defined as the human user (see Applicant's Specification,
page 8, [0018], lines 5-6). Hence, it is obvious that the output of the decoded signal
will be sent, since the reference deals with data exchange (see Abstract).
Further, it is implied that the output of the system will be transmitted to the user
since the reference deals with voice exchange). (Further, the sending of audio
information to the user from the pre-buffer would have been apparent with the
teaching presented by Clemm to avoid clipping).



As to claim 12, Tackin in view of Smith et al. teach all of the limitations as in
claim 9.

Furthermore, Tackin in view of Smith et al. teach a voice packet based
network.

However, Tackin in view of Smith et al. do not specifically teach the
buffering of a portion of said audio information in a pre-buffer for a predetermined
time interval.

Clemm teaches a buffer to store pre-threshold speech during detection
by the voice activity detector (see Figure 1, elements 110 and 120
and col. 2, lines 30-37) (The reference buffers pre-threshold speech based
upon two values, from a delay.)

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice based packet network as
taught by Tackin and Smith et al. with the buffer before the voice activity detector
as taught by Clemm. The motivation to have combined the references
involves the elimination of clipping associated with voice activity detection
during silence suppression (see Clemm, col. 2, lines 47-48) as would have been
seen in the teachings of Tackin.

7. Claims 10 and 11 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Tackin in view of Smith et al. as applied to claim 9 above, and further in view of 
Sih et al. (US 5,920,834). 

As to claim 10, Tackin in view of Smith et al. teach all of the limitations as in 
claim 9. 

Furthermore, Tackin in view of Smith et al. teach a voice packet based 
network. 

However, Tackin in view of Smith et al. do not specifically teach the echo 
canceller connected to a receiver to cancel the echo. 

However, Sih et al. does teach the echo canceller being connected to a 
receiver (see Figure 1, elements 14 and 10) (e.g. It is evident that a transceiver 
consists of a receiver and a transmitter). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have the echo canceller connected to a receiver. The
motivation to have combined the references involves cancellation of echo for
mobile phones that may occur in speech signals (e.g. see Sih et al., col. 23-25)
as would have been apparent in the teachings of Tackin, which describes
communication between telephony devices.
As to claim 11, Tackin, Smith et al., and Sih et al. teach all of the limitations as
in claim 9.

Furthermore, Sih et al. teaches a transmitter (see Figure 1, element 14)
(e.g. Transceiver consists of a transmitter) to provide an echo cancellation signal
to said echo canceller (see Figure 1, element 10 and col. 6, lines 14-18).

Conclusion 

4. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

8. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Paras Shah whose telephone number is (571)270-1650. 
The examiner can normally be reached on MON.-THURS. 7:00a.m.-4:00p.m. EST. 



Application/Control Number: 10/722,038
Art Unit: 2626

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard can be reached on (571)272-7603. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



02/07/2008 